Perform a data analysis of the games data set to understand the relationships between its columns and to predict future values.
This data was obtained from this site:
These are the packages from tidymodels that we are going to use in this project.
Rows: 1,512
Columns: 14
$ X_1 <dbl> 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, …
$ Title <chr> "Elden Ring", "Hades", "The Legend of Zelda: Breath of the Wild", "Undertale", "Hollow Knight", "Minecraft", "Omori", "Metroid Dread", "Among Us", "NieR: Automata", "Persona 5 Royal", "Stray", "God of War", "Portal 2", "Bl…
$ Release_Date <chr> "Feb 25, 2022", "Dec 10, 2019", "Mar 03, 2017", "Sep 15, 2015", "Feb 24, 2017", "Nov 18, 2011", "Dec 25, 2020", "Oct 07, 2021", "Jun 15, 2018", "Feb 23, 2017", "Oct 31, 2019", "Jul 19, 2022", "Apr 20, 2018", "Apr 18, 2011"…
$ Team <chr> "['Bandai Namco Entertainment', 'FromSoftware']", "['Supergiant Games']", "['Nintendo', 'Nintendo EPD Production Group No. 3']", "['tobyfox', '8-4']", "['Team Cherry']", "['Mojang Studios']", "['OMOCAT', 'PLAYISM']", "['Ni…
$ Rating <dbl> 4.5, 4.3, 4.4, 4.2, 4.4, 4.3, 4.2, 4.3, 3.0, 4.3, 4.4, 3.7, 4.2, 4.4, 4.5, 4.2, 4.4, 4.4, 4.1, 4.2, 3.7, 4.3, 4.1, 3.8, 3.3, 4.4, 4.4, 4.2, 4.6, 4.1, 4.2, 4.2, 4.1, 4.1, 4.1, 4.4, 4.2, 4.2, 2.6, 4.2, 3.9, 4.3, 4.3, 4.6, 4.…
$ Times_Listed <chr> "3.9K", "2.9K", "4.3K", "3.5K", "3K", "2.3K", "1.6K", "2.1K", "867", "2.9K", "2.7K", "1.5K", "2.9K", "2.9K", "3.4K", "2.8K", "2.7K", "2.9K", "2K", "2.9K", "1.6K", "926", "2.1K", "2.1K", "1.5K", "1.7K", "1.4K", "1.6K", "1.1…
$ Number_of_Reviews <chr> "3.9K", "2.9K", "4.3K", "3.5K", "3K", "2.3K", "1.6K", "2.1K", "867", "2.9K", "2.7K", "1.5K", "2.9K", "2.9K", "3.4K", "2.8K", "2.7K", "2.9K", "2K", "2.9K", "1.6K", "926", "2.1K", "2.1K", "1.5K", "1.7K", "1.4K", "1.6K", "1.1…
$ Genres <chr> "['Adventure', 'RPG']", "['Adventure', 'Brawler', 'Indie', 'RPG']", "['Adventure', 'RPG']", "['Adventure', 'Indie', 'RPG', 'Turn Based Strategy']", "['Adventure', 'Indie', 'Platform']", "['Adventure', 'Simulator']", "['Adv…
$ Summary <chr> "Elden Ring is a fantasy, action and open world game with RPG elements such as stats, weapons and spells. Rise, Tarnished, and be guided by grace to brandish the power of the Elden Ring and become an Elden Lord in the Land…
$ Reviews <chr> "[\"The first playthrough of elden ring is one of the best eperiences gaming can offer you but after youve explored everything in the open world and you've experienced all of the surprises you lose motivation to go explori…
$ Plays <chr> "17K", "21K", "30K", "28K", "21K", "33K", "7.2K", "9.2K", "25K", "18K", "12K", "7.7K", "21K", "29K", "17K", "20K", "15K", "19K", "28K", "25K", "9.1K", "3K", "14K", "30K", "13K", "5.3K", "3.9K", "5.9K", "6K", "21K", "19K", …
$ Playing <chr> "3.8K", "3.2K", "2.5K", "679", "2.4K", "1.8K", "1.1K", "759", "470", "1.1K", "2.3K", "801", "1.1K", "471", "1.1K", "1.2K", "1.8K", "1.7K", "244", "710", "1.6K", "866", "492", "829", "1.5K", "801", "795", "955", "1.2K", "57…
$ Backlogs <chr> "4.6K", "6.3K", "5K", "4.9K", "8.3K", "1.1K", "4.5K", "3.4K", "776", "6.2K", "5.1K", "2.5K", "4.8K", "3.9K", "5.6K", "5.9K", "6.4K", "5.5K", "2.7K", "2.9K", "2.5K", "1.5K", "4.2K", "3.2K", "4.7K", "2K", "2.1K", "2.5K", "5K…
$ Wishlist <chr> "4.8K", "3.6K", "2.6K", "1.8K", "2.3K", "230", "3.8K", "3.3K", "126", "3.6K", "3K", "3.4K", "2.6K", "1.2K", "3.3K", "2K", "2K", "2.9K", "1.1K", "2K", "2.1K", "2K", "2K", "664", "2.9K", "3.3K", "2.2K", "3.1K", "2.7K", "2.2K…
Our data set has 1,512 rows and 14 columns.
Most of the columns in our data set are of character data type (12 of 14).
The only non-character columns are X_1 and Rating, which are numeric (`<dbl>`).
X_1 Title Release_Date Team Rating Times_Listed Number_of_Reviews Genres Summary Reviews Plays Playing Backlogs
0 0 0 1 13 0 0 0 1 0 0 0 0
Wishlist
0
The only columns with missing data are Team, Rating, and Summary.
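The per-column counts above can be reproduced with a short base R call. The sketch below is only an illustration: `games_toy` is a made-up stand-in for the real data frame, whose name is not shown in this report.

```r
# Toy stand-in for the games data; the values are illustrative only.
games_toy <- data.frame(
  Title  = c("Elden Ring", "Hades", "Undertale"),
  Team   = c("['FromSoftware']", NA, "['tobyfox', '8-4']"),
  Rating = c(4.5, NA, 4.2),
  stringsAsFactors = FALSE
)

# Count the missing values in each column.
missing_counts <- colSums(is.na(games_toy))
print(missing_counts)

# Keep only the names of the columns with at least one missing value.
cols_with_na <- names(missing_counts)[missing_counts > 0]
print(cols_with_na)
```

Listing only the affected columns makes it easier to see which ones need imputation before modelling.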
X_1 Title Release_Date Team Rating Times_Listed Number_of_Reviews Genres Summary Reviews Plays Playing Backlogs
0 0 0 1 13 0 0 0 1 0 0 0 0
Wishlist
0
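The counts are the same as before the fill: Team, Rating, and Summary still have 1, 13, and 1 missing values, so the missing data has not actually been filled yet.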
Error in `geom_point()`:
! Problem while computing aesthetics.
ℹ Error occurred in the 1st layer.
Caused by error:
! object 'Number of Reviews' not found
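The error above says that ggplot2 could not find a column called `Number of Reviews`; the structure output lists the column as `Number_of_Reviews`. The sketch below shows a scatter plot that refers to the underscore name. It assumes ggplot2 is installed, uses a made-up data frame `games_toy`, and assumes the counts have already been converted to numbers (in the real data they are stored as text such as "3.9K"). Putting Rating on the y-axis is also an assumption, since the original plotting code is not shown.

```r
library(ggplot2)

# Toy stand-in: three rows with counts already converted to numbers.
games_toy <- data.frame(
  Rating = c(4.5, 4.3, 3.0),
  Number_of_Reviews = c(3900, 2900, 867)
)

# Refer to the column by its actual name, Number_of_Reviews.
# A name containing spaces would need backticks and must exist in the data.
p <- ggplot(games_toy, aes(x = Number_of_Reviews, y = Rating)) +
  geom_point() +
  labs(x = "Number of reviews", y = "Rating")

print(p)
```

Running `names()` on the data frame before plotting is a quick way to confirm how the columns are actually spelled.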
Most games have a rating of about 4, with roughly 500 games at that level.
There are many outliers in the boxplot of game ratings.
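The game ratings are roughly bell-shaped but not exactly normal: the distribution is left-skewed, with a mean of 3.72 below the median of 3.8.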
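A claim about the shape of the rating distribution can be checked numerically as well as visually. The sketch below compares the mean with the median and runs a Shapiro-Wilk test from base R. The `ratings` vector holds only the first 20 ratings printed in the structure output, so it is not representative of the full data set and serves purely to show the calls.

```r
# First 20 ratings as printed in the structure output (illustrative subset).
ratings <- c(4.5, 4.3, 4.4, 4.2, 4.4, 4.3, 4.2, 4.3, 3.0, 4.3,
             4.4, 3.7, 4.2, 4.4, 4.5, 4.2, 4.4, 4.4, 4.1, 4.2)

# A mean below the median hints at a left-skewed distribution.
center <- c(mean = mean(ratings), median = median(ratings))
print(center)

# Shapiro-Wilk test: a small p-value is evidence against normality.
shapiro_result <- shapiro.test(ratings)
print(shapiro_result)
```

On the full data the same calls would be applied to the Rating column after removing missing values.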
## Non-Visual Exploratory Data Analysis
From the two tables we can say our data is not biased.

## Structure of our data set
Rows: 1,512
Columns: 14
$ X_1 <dbl> 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, …
$ Title <fct> "Elden Ring", "Hades", "The Legend of Zelda: Breath of the Wild", "Undertale", "Hollow Knight", "Minecraft", "Omori", "Metroid Dread", "Among Us", "NieR: Automata", "Persona 5 Royal", "Stray", "God of War", "Portal 2", "Bl…
$ Release_Date <fct> "Feb 25, 2022", "Dec 10, 2019", "Mar 03, 2017", "Sep 15, 2015", "Feb 24, 2017", "Nov 18, 2011", "Dec 25, 2020", "Oct 07, 2021", "Jun 15, 2018", "Feb 23, 2017", "Oct 31, 2019", "Jul 19, 2022", "Apr 20, 2018", "Apr 18, 2011"…
$ Team <fct> "['Bandai Namco Entertainment', 'FromSoftware']", "['Supergiant Games']", "['Nintendo', 'Nintendo EPD Production Group No. 3']", "['tobyfox', '8-4']", "['Team Cherry']", "['Mojang Studios']", "['OMOCAT', 'PLAYISM']", "['Ni…
$ Rating <dbl> 4.5, 4.3, 4.4, 4.2, 4.4, 4.3, 4.2, 4.3, 3.0, 4.3, 4.4, 3.7, 4.2, 4.4, 4.5, 4.2, 4.4, 4.4, 4.1, 4.2, 3.7, 4.3, 4.1, 3.8, 3.3, 4.4, 4.4, 4.2, 4.6, 4.1, 4.2, 4.2, 4.1, 4.1, 4.1, 4.4, 4.2, 4.2, 2.6, 4.2, 3.9, 4.3, 4.3, 4.6, 4.…
$ Times_Listed <fct> 3.9K, 2.9K, 4.3K, 3.5K, 3K, 2.3K, 1.6K, 2.1K, 867, 2.9K, 2.7K, 1.5K, 2.9K, 2.9K, 3.4K, 2.8K, 2.7K, 2.9K, 2K, 2.9K, 1.6K, 926, 2.1K, 2.1K, 1.5K, 1.7K, 1.4K, 1.6K, 1.1K, 2.5K, 2.4K, 1.5K, 2.6K, 2.5K, 1.9K, 2.3K, 2.9K, 1.9K, …
$ Number_of_Reviews <fct> 3.9K, 2.9K, 4.3K, 3.5K, 3K, 2.3K, 1.6K, 2.1K, 867, 2.9K, 2.7K, 1.5K, 2.9K, 2.9K, 3.4K, 2.8K, 2.7K, 2.9K, 2K, 2.9K, 1.6K, 926, 2.1K, 2.1K, 1.5K, 1.7K, 1.4K, 1.6K, 1.1K, 2.5K, 2.4K, 1.5K, 2.6K, 2.5K, 1.9K, 2.3K, 2.9K, 1.9K, …
$ Genres <fct> "['Adventure', 'RPG']", "['Adventure', 'Brawler', 'Indie', 'RPG']", "['Adventure', 'RPG']", "['Adventure', 'Indie', 'RPG', 'Turn Based Strategy']", "['Adventure', 'Indie', 'Platform']", "['Adventure', 'Simulator']", "['Adv…
$ Summary <fct> "Elden Ring is a fantasy, action and open world game with RPG elements such as stats, weapons and spells. Rise, Tarnished, and be guided by grace to brandish the power of the Elden Ring and become an Elden Lord in the Land…
$ Reviews <fct> "[\"The first playthrough of elden ring is one of the best eperiences gaming can offer you but after youve explored everything in the open world and you've experienced all of the surprises you lose motivation to go explori…
$ Plays <fct> 17K, 21K, 30K, 28K, 21K, 33K, 7.2K, 9.2K, 25K, 18K, 12K, 7.7K, 21K, 29K, 17K, 20K, 15K, 19K, 28K, 25K, 9.1K, 3K, 14K, 30K, 13K, 5.3K, 3.9K, 5.9K, 6K, 21K, 19K, 6.7K, 21K, 25K, 18K, 14K, 15K, 13K, 14K, 2.2K, 9.9K, 21K, 16K,…
$ Playing <fct> 3.8K, 3.2K, 2.5K, 679, 2.4K, 1.8K, 1.1K, 759, 470, 1.1K, 2.3K, 801, 1.1K, 471, 1.1K, 1.2K, 1.8K, 1.7K, 244, 710, 1.6K, 866, 492, 829, 1.5K, 801, 795, 955, 1.2K, 577, 851, 880, 463, 1.2K, 1.2K, 919, 1.1K, 1.5K, 2.7K, 419, 3…
$ Backlogs <fct> 4.6K, 6.3K, 5K, 4.9K, 8.3K, 1.1K, 4.5K, 3.4K, 776, 6.2K, 5.1K, 2.5K, 4.8K, 3.9K, 5.6K, 5.9K, 6.4K, 5.5K, 2.7K, 2.9K, 2.5K, 1.5K, 4.2K, 3.2K, 4.7K, 2K, 2.1K, 2.5K, 5K, 2.9K, 4.3K, 4.1K, 2.5K, 1.1K, 4.5K, 4.8K, 5K, 5.2K, 1.3…
$ Wishlist <fct> 4.8K, 3.6K, 2.6K, 1.8K, 2.3K, 230, 3.8K, 3.3K, 126, 3.6K, 3K, 3.4K, 2.6K, 1.2K, 3.3K, 2K, 2K, 2.9K, 1.1K, 2K, 2.1K, 2K, 2K, 664, 2.9K, 3.3K, 2.2K, 3.1K, 2.7K, 2.2K, 2.2K, 3.7K, 775, 801, 2.6K, 3.4K, 1.5K, 2.2K, 280, 2.2K, …
After data cleaning, our data set has only numeric and factor columns.

## Statistical summary of our data set
| | |
|---|---|
| Name | data |
| Number of rows | 1512 |
| Number of columns | 14 |
| Column type frequency: | |
| factor | 12 |
| numeric | 2 |
| Group variables | None |
Variable type: factor
| skim_variable | n_missing | complete_rate | ordered | n_unique | top_counts |
|---|---|---|---|---|---|
| Title | 0 | 1 | FALSE | 1099 | Doo: 7, Dea: 5, Res: 5, Sha: 5 |
| Release_Date | 0 | 1 | FALSE | 987 | Nov: 8, Nov: 7, Jun: 6, Jun: 5 |
| Team | 1 | 1 | FALSE | 764 | ['C: 35, ['S: 31, ['N: 19, ['N: 19 |
| Times_Listed | 0 | 1 | FALSE | 606 | 1.1: 46, 1.2: 39, 1.3: 34, 1.5: 27 |
| Number_of_Reviews | 0 | 1 | FALSE | 606 | 1.1: 46, 1.2: 39, 1.3: 34, 1.5: 27 |
| Genres | 0 | 1 | FALSE | 255 | ['A: 154, ['A: 107, ['A: 82, ['S: 72 |
| Summary | 1 | 1 | FALSE | 1112 | Min: 4, 'Da: 3, A 2: 3, A 3: 3 |
| Reviews | 0 | 1 | FALSE | 1117 | []: 12, ['A: 3, ['A: 3, ['A: 3 |
| Plays | 0 | 1 | FALSE | 258 | 12K: 50, 13K: 40, 14K: 39, 1.6: 33 |
| Playing | 0 | 1 | FALSE | 396 | 1.1: 24, 1.2: 17, 22: 14, 2: 13 |
| Backlogs | 0 | 1 | FALSE | 544 | 1.5: 52, 1.1: 43, 1.3: 42, 1.8: 33 |
| Wishlist | 0 | 1 | FALSE | 573 | 1.3: 41, 1.2: 39, 1.1: 30, 1.4: 30 |
Variable type: numeric
| skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
|---|---|---|---|---|---|---|---|---|---|---|
| X_1 | 0 | 1.00 | 755.50 | 436.62 | 0.0 | 377.75 | 755.5 | 1133.25 | 1511.0 | ▇▇▇▇▇ |
| Rating | 13 | 0.99 | 3.72 | 0.53 | 0.7 | 3.40 | 3.8 | 4.10 | 4.8 | ▁▁▂▇▆ |
These are the games with a rating of 4 and above.
Here are the data wrangling steps done above:

* Fill missing data with the mode.
* Convert all character columns to factor.
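The two steps can be sketched in base R as follows. This is not the code used in the report, which is not shown; `mode_value`, `fill_with_mode`, and `games_toy` are hypothetical names introduced for illustration.

```r
# Return the most frequent non-missing value of a vector.
mode_value <- function(x) {
  non_missing <- x[!is.na(x)]
  unique_vals <- unique(non_missing)
  unique_vals[which.max(tabulate(match(non_missing, unique_vals)))]
}

# Replace NAs with the mode; works for numeric and character vectors.
fill_with_mode <- function(x) {
  x[is.na(x)] <- mode_value(x)
  x
}

# Toy stand-in for the games data; the values are illustrative only.
games_toy <- data.frame(
  Team   = c("['Nintendo']", NA, "['Nintendo']", "['Capcom']"),
  Rating = c(4.1, 4.1, NA, 3.8),
  stringsAsFactors = FALSE
)

# Step 1: fill every column and assign the result back.
# Without the assignment the original data frame stays unchanged.
games_filled <- games_toy
games_filled[] <- lapply(games_filled, fill_with_mode)

# Step 2: convert every character column to a factor.
is_chr <- vapply(games_filled, is.character, logical(1))
games_filled[is_chr] <- lapply(games_filled[is_chr], factor)

print(colSums(is.na(games_filled)))
str(games_filled)
```

Re-running the missing-value count on the filled data frame, as in the last lines, confirms whether the imputation actually took effect.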
Error in `mutate()`:
ℹ In argument: `across(...)`.
Caused by error in `across()`:
! Can't subset columns that don't exist.
✖ Column `Times Listed` doesn't exist.
Error in `$<-.data.frame`(`*tmp*`, `Release Date`, value = structure(numeric(0), class = "Date")): replacement has 0 rows, data has 1512
Error in `$<-.data.frame`(`*tmp*`, Year, value = numeric(0)): replacement has 0 rows, data has 1512
Error in `$<-.data.frame`(`*tmp*`, Month, value = structure(integer(0), levels = c("Jan", : replacement has 0 rows, data has 1512
Error in `select()`:
! Can't subset columns that don't exist.
✖ Column `...1` doesn't exist.
Error in `[.data.frame`(data_filled, , c("Times.Listed", "Number.of.Reviews", : undefined columns selected
[1] NA NA NA NA NA NA
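The errors above all point at column names that do not exist in the data: the code refers to `Times Listed`, `Release Date`, `...1`, and `Times.Listed`, whereas the structure output shows `Times_Listed`, `Release_Date`, and `X_1`. The sketch below repeats the same kind of steps with the underscore names, in base R. It is an assumption about what the failing code intended; `parse_k` and `games_toy` are hypothetical names, and the toy rows are copied from the structure output.

```r
# A few rows and columns copied from the structure output above.
games_toy <- data.frame(
  X_1 = c(0, 1, 8),
  Release_Date = c("Feb 25, 2022", "Dec 10, 2019", "Jun 15, 2018"),
  Times_Listed = c("3.9K", "2.9K", "867"),
  Plays = c("17K", "21K", "25K"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: "3.9K" -> 3900, "867" -> 867.
parse_k <- function(x) {
  x <- as.character(x)  # also handles factor input
  has_k <- grepl("K$", x)
  as.numeric(sub("K$", "", x)) * ifelse(has_k, 1000, 1)
}

# Convert the count columns, referring to them by their underscore names.
count_cols <- c("Times_Listed", "Plays")
games_toy[count_cols] <- lapply(games_toy[count_cols], parse_k)

# Parse the release date. %b matches English month abbreviations only
# when R runs in an English (or C) locale.
games_toy$Release_Date <- as.Date(games_toy$Release_Date, format = "%b %d, %Y")
games_toy$Year <- as.integer(format(games_toy$Release_Date, "%Y"))
games_toy$Month <- factor(format(games_toy$Release_Date, "%b"), levels = month.abb)

# Drop the row-index column, which is named X_1 here rather than ...1.
games_toy$X_1 <- NULL

str(games_toy)
```

Converting the "K" strings to numbers before any factor conversion matters, because turning a factor into an integer yields level codes rather than the original counts.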
Rows: 1,512
Columns: 16
$ X_1 <dbl> 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, …
$ Title <int> 252, 377, 973, 1041, 404, 588, 644, 575, 19, 628, 677, 878, 353, 724, 84, 131, 1083, 751, 723, 902, 705, 396, 559, 364, 168, 356, 1078, 460, 214, 528, 176, 627, 894, 915, 166, 796, 291, 675, 344, 822, 764, 899, 270, 648, 7…
$ Release_Date <int> 281, 166, 482, 919, 278, 703, 205, 769, 441, 275, 871, 384, 33, 23, 551, 330, 507, 839, 776, 846, 341, 332, 261, 926, 162, 655, 411, 556, 572, 892, 552, 38, 458, 156, 980, 542, 348, 440, 976, 848, 581, 625, 811, 617, 331, …
$ Team <int> 78, 650, 449, 690, 659, 378, 494, 440, 297, 509, 71, 116, 603, 723, 244, 224, 552, 656, 723, 464, 430, 657, 329, 543, 142, 603, 383, 276, 761, 300, 78, 696, 443, 605, 642, 239, 616, 478, 372, 549, 133, 466, 489, 376, 121, …
$ Rating <dbl> 4.5, 4.3, 4.4, 4.2, 4.4, 4.3, 4.2, 4.3, 3.0, 4.3, 4.4, 3.7, 4.2, 4.4, 4.5, 4.2, 4.4, 4.4, 4.1, 4.2, 3.7, 4.3, 4.1, 3.8, 3.3, 4.4, 4.4, 4.2, 4.6, 4.1, 4.2, 4.2, 4.1, 4.1, 4.1, 4.4, 4.2, 4.2, 2.6, 4.2, 3.9, 4.3, 4.3, 4.6, 4.…
$ Times_Listed <int> 185, 100, 270, 184, 269, 94, 7, 92, 556, 100, 98, 6, 100, 100, 183, 99, 98, 100, 182, 100, 7, 582, 92, 92, 6, 8, 5, 7, 2, 96, 95, 6, 97, 96, 10, 94, 100, 10, 451, 533, 9, 98, 182, 9, 96, 9, 2, 8, 8, 96, 93, 182, 9, 10, 8, …
$ Number_of_Reviews <int> 185, 100, 270, 184, 269, 94, 7, 92, 556, 100, 98, 6, 100, 100, 183, 99, 98, 100, 182, 100, 7, 582, 92, 92, 6, 8, 5, 7, 2, 96, 95, 6, 97, 96, 10, 94, 100, 10, 451, 533, 9, 98, 182, 9, 96, 9, 2, 8, 8, 96, 93, 182, 9, 10, 8, …
$ Genres <int> 126, 22, 126, 70, 56, 132, 70, 92, 207, 166, 125, 80, 34, 83, 126, 56, 32, 116, 215, 92, 125, 24, 35, 129, 116, 38, 126, 92, 72, 38, 126, 126, 92, 180, 148, 38, 125, 119, 126, 80, 129, 92, 231, 66, 129, 246, 125, 167, 199,…
$ Summary <int> 294, 57, 920, 69, 20, 599, 83, 508, 513, 624, 110, 555, 401, 761, 105, 422, 898, 718, 1056, 330, 952, 133, 252, 408, 234, 402, 86, 509, 261, 822, 237, 623, 887, 541, 233, 303, 351, 108, 393, 781, 733, 23, 481, 646, 721, 25…
$ Reviews <int> 1056, 141, 705, 627, 1079, 445, 1053, 253, 956, 263, 758, 543, 214, 714, 887, 765, 716, 517, 161, 816, 673, 373, 704, 522, 952, 61, 1011, 404, 810, 528, 679, 478, 769, 577, 202, 948, 270, 441, 958, 611, 706, 693, 1058, 591…
$ Plays <int> 30, 55, 90, 73, 55, 96, 184, 235, 66, 35, 18, 189, 55, 76, 30, 52, 24, 39, 73, 66, 234, 107, 23, 90, 21, 136, 87, 142, 182, 55, 39, 166, 55, 66, 35, 23, 24, 21, 23, 43, 242, 55, 29, 189, 24, 35, 108, 21, 52, 21, 217, 30, 2…
$ Playing <int> 180, 179, 106, 329, 105, 8, 3, 350, 266, 3, 104, 361, 3, 267, 3, 4, 8, 7, 141, 338, 6, 372, 275, 364, 5, 361, 358, 391, 4, 300, 370, 377, 261, 4, 4, 383, 3, 5, 107, 242, 205, 263, 349, 325, 304, 290, 102, 314, 177, 268, 16…
$ Backlogs <int> 219, 360, 356, 222, 464, 2, 218, 148, 456, 359, 293, 75, 221, 152, 296, 297, 361, 295, 77, 79, 75, 6, 216, 146, 220, 144, 71, 75, 356, 79, 217, 215, 75, 2, 218, 221, 356, 294, 4, 10, 78, 145, 356, 221, 150, 291, 148, 356, …
$ Wishlist <int> 267, 192, 105, 8, 102, 132, 194, 190, 34, 192, 265, 191, 105, 2, 190, 187, 187, 108, 1, 187, 100, 187, 187, 434, 108, 190, 101, 189, 106, 101, 101, 193, 488, 501, 105, 191, 5, 101, 173, 101, 107, 2, 4, 189, 9, 4, 105, 102,…
$ `Total Plays` <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
$ Average_Rating <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
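That is our new data frame after data wrangling. Note that the former factor columns now hold integer codes rather than their original values (for example, the Times_Listed value "3.9K" became 185), and `Total Plays` and `Average_Rating` are NA in every row shown, which is consistent with the errors above.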
These data show random variation; there are no patterns or cycles.